Shrinkage effect



We answer some shared concerns from reviewers and then answer their specific questions separately

Neural Information Processing Systems

We thank the reviewers for reading our paper and for their constructive comments. On baselines, one reviewer "would have liked to have seen comparisons to more fundamental baselines that didn't make the same ..."; Reviewer 3 suggests that "socialGAN, SoPHie and other multi-agent representation learning approaches should be added..."; and Reviewer 4 notes that "the paper mentions other approaches and it might be useful to see a comparison to other papers..." On the comment that "the shot quality prediction is similar to the results reported in 'Quality vs Quantity'... Can the ...": prior work on ice hockey shot prediction does not take into account the identity of the shooter. For instance, the scoring chance is higher for a top player vs. an average player, and Table 1 shows the benefits of modelling shooter-specific effects. On Reviewer 4's concern that "it is unclear that the ladder aspect of the architecture is providing an improvement on this application": we can discuss the higher levels in the final version; I suspect they might look similar to VaRLAE. Our main contribution is the idea of Player representation through Player Generation (Section 3).
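To illustrate what a shooter-specific effect looks like in a model, here is a minimal sketch assuming a simple discriminative architecture: a learned embedding per shooter concatenated with shot features. This is not the paper's generative VaRLAE model, and all names and dimensions here are illustrative.

```python
import torch
import torch.nn as nn

class ShotQualityNet(nn.Module):
    """Toy shot-quality predictor with shooter-specific effects.

    Illustrative assumption: a per-player embedding stands in for a learned
    player representation; conditioning on it lets the model assign a top
    shooter a higher scoring chance than an average one for the same shot.
    """
    def __init__(self, n_players: int, n_shot_features: int, emb_dim: int = 16):
        super().__init__()
        self.player_emb = nn.Embedding(n_players, emb_dim)
        self.head = nn.Sequential(
            nn.Linear(emb_dim + n_shot_features, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, player_id: torch.Tensor, shot_features: torch.Tensor):
        z = self.player_emb(player_id)              # shooter identity
        x = torch.cat([z, shot_features], dim=-1)   # shot context + identity
        return torch.sigmoid(self.head(x))          # P(goal | shot, shooter)
```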


On Multiplicative Multitask Feature Learning

Xin Wang, Jinbo Bi, Shipeng Yu, Jiangwen Sun

Neural Information Processing Systems

We investigate a general framework of multiplicative multitask feature learning, which decomposes each task's model parameters into a multiplication of two components: one component is shared across all tasks, and the other is task-specific. Several previous methods have been proposed as special cases of our framework. We study the theoretical properties of this framework when different regularization conditions are applied to the two decomposed components. We prove that this framework is mathematically equivalent to the widely used multitask feature learning methods that are based on a joint regularization of all model parameters, but with a more general form of regularizers. Further, an analytical formula is derived that relates the across-task component to the task-specific component for all these regularizers, leading to a better understanding of the shrinkage effect.
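A minimal sketch of the multiplicative decomposition may help: each task's weight vector is the elementwise product of a shared vector c and a task-specific column of V, with separate penalties on the two components. The squared loss and the specific L1/L2 penalties below are illustrative assumptions, not the paper's general family of regularizers.

```python
import numpy as np

def mmtfl_objective(c, V, Xs, ys, lam_c=0.1, lam_v=0.1):
    """Illustrative objective for multiplicative multitask feature learning.

    Task t uses weights w_t = c * V[:, t] (elementwise), where c is shared
    across all tasks and V[:, t] is task-specific. The penalties below are
    placeholder choices; the paper analyzes a general family and shows the
    framework is equivalent to jointly regularized multitask learning.
    """
    total = 0.0
    for t, (X, y) in enumerate(zip(Xs, ys)):
        w_t = c * V[:, t]                       # multiplicative decomposition
        total += 0.5 * np.sum((X @ w_t - y) ** 2)
    total += lam_c * np.sum(np.abs(c))          # across-task feature selection
    total += lam_v * 0.5 * np.sum(V ** 2)       # task-specific shrinkage
    return total
```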


High dimensional thresholded regression and shrinkage effect

Zemin Zheng, Yingying Fan, Jinchi Lv

arXiv.org Machine Learning

High-dimensional sparse modeling via regularization provides a powerful tool for analyzing large-scale data sets and obtaining meaningful, interpretable models. Nonconvex penalty functions show advantages in selecting important features in high dimensions, but the global optimality of such methods still demands more understanding. In this paper, we consider sparse regression with a hard-thresholding penalty, which we show gives rise to thresholded regression. This approach is motivated by its close connection with $L_0$-regularization, which can be unrealistic to implement in practice but has appealing sampling properties, and by its computational advantage. Under some mild regularity conditions allowing possibly exponentially growing dimensionality, we establish oracle inequalities for the resulting regularized estimator, as the global minimizer, under various prediction and variable selection losses, as well as oracle risk inequalities for the hard-thresholded estimator followed by a further $L_2$-regularization. The risk properties exhibit interesting shrinkage effects under both estimation and prediction losses. We identify the optimal choice of the ridge parameter, which is shown to have simultaneous advantages for both the $L_2$-loss and the prediction loss. These new results and phenomena are evidenced by simulation and real data examples.
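As a concrete illustration of the two-stage procedure described above, here is a minimal sketch: iterative hard thresholding for the hard-thresholding penalty, followed by a ridge ($L_2$) refit on the selected support. The threshold, ridge parameter, step size, and iteration count are assumptions for illustration, not the paper's theory-backed choices.

```python
import numpy as np

def hard_threshold(w, tau):
    """Hard-thresholding operator: zero out entries with |w_j| <= tau."""
    out = w.copy()
    out[np.abs(out) <= tau] = 0.0
    return out

def thresholded_regression(X, y, tau=0.5, lam_ridge=1.0, n_iter=200):
    """Sketch: iterative hard thresholding, then an L2 refit on the support.

    This mirrors the abstract's hard-thresholded estimator followed by a
    further L2-regularization; it is an illustrative algorithm, not the
    paper's exact estimator or its optimal ridge-parameter tuning.
    """
    n, p = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)  # 1/L for loss 0.5*||Xw - y||^2
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = hard_threshold(w - step * grad, tau)
    support = np.flatnonzero(w)
    if support.size:                          # ridge refit on selected features
        Xs = X[:, support]
        ws = np.linalg.solve(Xs.T @ Xs + lam_ridge * np.eye(support.size),
                             Xs.T @ y)
        w = np.zeros(p)
        w[support] = ws
    return w
```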